Factor oracle : a new structure for pattern matching
We introduce a new automaton on a word p, a sequence of letters over an alphabet Σ, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m+1 states (where m is the length of p), and has a linear number of transitions. We give an on-line construction to build it. We use this new structure in string-matching algorithms that we conjecture to be optimal according to our experimental results. These algorithms are as efficient as existing ones while using less memory and being easier to implement.
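As an illustration of the abstract above, here is a minimal Python sketch of the standard on-line factor-oracle construction (function and variable names are our own; the supply-function bookkeeping follows the usual presentation of this structure):

```python
def build_factor_oracle(p):
    """On-line construction of the factor oracle for word p.

    Returns a transition table: a list of m+1 dicts, one per state,
    mapping a letter to the target state.
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]
    supply = [0] * (m + 1)  # supply function S; S[0] is a sentinel
    supply[0] = -1
    for i, c in enumerate(p):
        trans[i][c] = i + 1          # internal transition i -> i+1
        k = supply[i]
        # Follow supply links, adding external transitions on c
        # until a state already reading c (or the sentinel) is found.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        supply[i + 1] = trans[k][c] if k > -1 else 0
    return trans

def accepts(trans, w):
    """Check whether the oracle has a path spelling w from state 0."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

Because external transitions are only ever added once per state, the oracle has at most 2m-1 transitions in total, and every factor of p is accepted (along with possibly a few non-factors, which is what makes the structure so compact).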
Large-scale Language Model Rescoring on Long-form Data
In this work, we study the impact of Large-scale Language Models (LLMs) on
Automated Speech Recognition (ASR) of YouTube videos, which we use as a source
for long-form ASR. We demonstrate up to 8\% relative reduction in Word Error
Rate (WER) on US English (en-us) and code-switched Indian English (en-in)
long-form ASR test sets and a reduction of up to 30\% relative on Salient Term
Error Rate (STER) over a strong first-pass baseline that uses a maximum-entropy
based language model. Improved lattice processing that produces a lattice
with a proper (non-tree) digraph topology and carries context from the 1-best
hypothesis of the previous segment(s) yields significant wins in rescoring
with LLMs. We also find that the gains in performance from combining LLMs
trained on vast quantities of available data (such as C4) with conventional
neural LMs are additive, significantly outperforming a strong first-pass
baseline with a maximum-entropy LM.
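The second-pass rescoring idea in this abstract can be illustrated with a toy sketch, assuming a simple n-best representation rather than a full lattice. The log-linear interpolation weight, the hypothetical n-best list, and the stand-in unigram scorer (which takes the place of a real LLM) are all our own illustrative assumptions:

```python
# Hypothetical n-best list: (hypothesis, first_pass_log_score) pairs
# from a baseline first-pass language model.
nbest = [
    ("play the video", -4.2),
    ("play the radio", -3.9),
    ("pay the video", -6.1),
]

def llm_logprob(hyp):
    """Stand-in for a real LLM scorer; here a toy unigram log-prob model."""
    logp = {"play": -1.0, "pay": -3.0, "the": -0.5,
            "video": -1.2, "radio": -2.5}
    return sum(logp.get(w, -10.0) for w in hyp.split())

def rescore(nbest, lam=0.5):
    """Log-linearly interpolate first-pass and LLM scores, return the new 1-best."""
    scored = [(h, (1 - lam) * s + lam * llm_logprob(h)) for h, s in nbest]
    return max(scored, key=lambda x: x[1])
```

In this toy setup the first pass alone prefers "play the radio", but after interpolating with the LLM score the 1-best flips to "play the video", which is the kind of correction rescoring provides.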
Comment: 5 pages, accepted in ICASSP 202
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
We explore unifying a neural segmenter with two-pass cascaded encoder ASR
into a single model. A key challenge is allowing the segmenter (which runs in
real-time, synchronously with the decoder) to finalize the 2nd pass (which runs
900 ms behind real-time) without introducing user-perceived latency or deletion
errors during inference. We propose a design where the neural segmenter is
integrated with the causal 1st pass decoder to emit an end-of-segment (EOS)
signal in real-time. The EOS signal is then used to finalize the non-causal 2nd
pass. We experiment with different ways to finalize the 2nd pass, and find that
a novel dummy frame injection strategy allows for simultaneous high quality 2nd
pass results and low finalization latency. On a real-world long-form captioning
task (YouTube), we achieve 2.4% relative WER and 140 ms EOS latency gains over
a baseline VAD-based segmenter with the same cascaded encoder.
Pushdown automata in statistical machine translation
This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass, informed by the results of the PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
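The decoder described above operates on PDAs via expansion and composition; as a simplified illustration of the shortest-path step alone, here is a sketch of Dijkstra's algorithm over an already-expanded finite lattice. The lattice encoding (a dict of state to outgoing arcs) and the example weights are our own illustrative assumptions, not HiPDT's actual representation:

```python
import heapq

def shortest_path(arcs, start, final):
    """Dijkstra over a weighted lattice with non-negative arc costs.

    arcs: dict mapping state -> list of (label, cost, next_state).
    Returns (total_cost, label_sequence) of the cheapest path from
    start to final, or None if final is unreachable.
    """
    heap = [(0.0, start, [])]
    best = {start: 0.0}
    while heap:
        cost, state, path = heapq.heappop(heap)
        if state == final:
            return cost, path
        if cost > best.get(state, float("inf")):
            continue  # stale heap entry
        for label, w, nxt in arcs.get(state, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [label]))
    return None
```

On a toy candidate lattice with two paths, the function returns the lower-cost label sequence; in a real decoder the arc costs would combine translation-model and language-model scores.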